113 research outputs found

    GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    BACKGROUND: To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. RESULTS: We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. CONCLUSION: GeneBins provides resources to interpret gene expression results from microarray experiments. It is available a

    A systematic comparison of linear regression-based statistical methods to assess exposome-health associations

    BACKGROUND: The exposome constitutes a promising framework to better understand the effect of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. OBJECTIVES: We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. METHODS: In a simulation study, we generated 237 exposure covariates with a realistic correlation structure, and a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. RESULTS: On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm showed a sensitivity of 80% and an FDP of 33%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%), despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. CONCLUSIONS: Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study are limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. While GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods.
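
The two comparison criteria named in this abstract, false discovery proportion and sensitivity, can be illustrated with a minimal sketch. The function name and the example indices are hypothetical and are not taken from the study; the definitions used are the standard ones (false positives over all selected covariates, and true positives over all true predictors).

```python
def fdp_and_sensitivity(selected, true_predictors):
    """Return (FDP, sensitivity) for one simulated data set.

    selected        -- covariates flagged by a statistical method
    true_predictors -- covariates truly related to the outcome
    """
    selected = set(selected)
    true_predictors = set(true_predictors)
    true_pos = len(selected & true_predictors)
    false_pos = len(selected - true_predictors)
    # FDP is conventionally taken as 0 when nothing is selected.
    fdp = false_pos / len(selected) if selected else 0.0
    # Sensitivity is undefined when there are no true predictors.
    if true_predictors:
        sensitivity = true_pos / len(true_predictors)
    else:
        sensitivity = float("nan")
    return fdp, sensitivity


# Hypothetical example: 4 covariates selected, 3 true predictors.
fdp, sens = fdp_and_sensitivity(selected=[1, 2, 3, 4], true_predictors=[1, 2, 5])
print(fdp, sens)
```

In a setting with zero true predictors (the abstract simulates 0 to 25), sensitivity has no meaning and only the FDP is informative, which is why the sketch returns NaN in that case.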

    The distribution of genetic diversity in a Brassica oleracea gene bank collection related to the effects on diversity of regeneration, as measured with AFLPs

    The ex situ conservation of plant genetic resources in gene banks involves the selection of accessions to be conserved and the maintenance of these accessions for current and future users. Decisions concerning both these issues require knowledge about the distribution of genetic diversity within and between accessions sampled from the gene pool, but also about the changes in variation of these samples as a result of regenerations. These issues were studied in an existing gene bank collection of a cross-pollinating crop using a selection of groups of very similar Dutch white cabbage accessions, and additional groups of reference material representing the Dutch and the global white cabbage gene pool. Six accessions were sampled both before and after a standard regeneration. Thirty plants from each of the 50 accessions plus 6 regeneration populations included in the study were characterised with AFLPs, using scores for 103 polymorphic bands. It was shown that the genetic changes resulting from standard gene bank regenerations, as measured by AFLPs, are comparable in magnitude to the differences between some of the more similar accessions. The observed changes are mainly due to highly significant changes in allele frequencies for a few fragments, whereas for the majority of fragments the alleles occur in similar frequencies before and after regeneration. It is argued that, given the changes of accessions over generations, accessions that display similar levels of differentiation may be combined safely.

    Semi-supervised discovery of differential genes

    BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those that aim to detect clinical marker genes, the conditions have not necessarily been controlled well, and condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available, or a semi-supervised case where labels are available for only part of the sample set, rather than the well-studied supervised case where all samples have labels. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems. CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.

    A Platform for Processing Expression of Short Time Series (PESTS)

    BACKGROUND: Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time series datasets contain fewer than 9 points, and few tools are geared towards the analysis of this type of data. RESULTS: To this end, we introduce a platform for Processing Expression of Short Time Series (PESTS). It was designed with a focus on usability and interpretability of analyses for the researcher. As such, it implements several standard techniques for comparability as well as visualization functions. However, it is designed specifically for the unique methods we have developed for significance analysis, multiple test correction and clustering of short time series data. The central tenet of these methods is the use of biologically relevant features for analysis. Features summarize short gene expression profiles, inherently incorporate dependence across time, and allow for both full description of the examined curve and missing data points. CONCLUSIONS: PESTS is fully generalizable to other types of time series analyses. PESTS implements novel methods as well as several standard techniques for comparability and visualization functions. These features and functionality make PESTS a valuable resource for a researcher's toolkit. PESTS is available to download for free to academic and non-profit users at http://www.mailman.columbia.edu/academic-departments/biostatistics/research-service/software-development.

    The efficacy of a comprehensive lifestyle modification programme based on yoga in the management of bronchial asthma: a randomized controlled trial

    BACKGROUND: There is a substantial body of evidence on the efficacy of yoga in the management of bronchial asthma. Many studies have reported that yoga produces significant improvements in pulmonary function and quality of life, and reductions in airway hyper-reactivity, frequency of attacks and medication use. In addition, a few studies have attempted to understand the effects of yoga on exercise-induced bronchoconstriction (EIB) or exercise tolerance capacity. However, none of these studies has investigated any immunological mechanisms by which yoga improves these variables in bronchial asthma. METHODS: The present randomized controlled trial (RCT) was conducted on 57 adult subjects with mild or moderate bronchial asthma who were allocated randomly to either the yoga (intervention) group (n = 29) or the wait-listed control group (n = 28). The control group received only conventional care and the yoga group received an intervention based on yoga, in addition to the conventional care. The intervention consisted of 2 wk of supervised training in lifestyle modification and stress management based on yoga, followed by closely monitored continuation of the practices at home for 6 wk. The outcome measures were assessed in both groups at 0 wk (baseline), 2, 4 and 8 wk by using Generalized Linear Model (GLM) repeated measures followed by post-hoc analysis. RESULTS: In the yoga group, there was a steady and progressive improvement in pulmonary function, the change being statistically significant for forced expiratory volume in the first second (FEV1) at 8 wk, and for peak expiratory flow rate (PEFR) at 2, 4 and 8 wk, as compared to the corresponding baseline values. There was a significant reduction in EIB in the yoga group. However, there was no corresponding reduction in the urinary prostaglandin D2 metabolite (11β prostaglandin F2α) levels in response to the exercise challenge. There was also no significant change in serum eosinophilic cationic protein levels during the 8-wk study period in either group. There was a significant improvement in Asthma Quality of Life (AQOL) scores in both groups over the 8-wk study period, but the improvement was achieved earlier and was more complete in the yoga group. The number-needed-to-treat worked out to be 1.82 for the total AQOL score. The improvement in total AQOL score was greater than the minimal important difference, and the same outcome was achieved for the sub-domains of the AQOL. The frequency of rescue medication use showed a significant decrease over the study period in both groups. However, the decrease was achieved relatively earlier and was more marked in the yoga group than in the control group. CONCLUSION: The present RCT has demonstrated that adding the mind-body approach of yoga to the predominantly physical approach of conventional care results in measurable improvement in subjective as well as objective outcomes in bronchial asthma. The trial supports the efficacy of yoga in the management of bronchial asthma. However, the preliminary efforts made towards working out the mechanism of action of the intervention have not thrown much light on how yoga works in bronchial asthma. TRIAL REGISTRATION: Current Controlled Trials ISRCTN00815962
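
As a reading aid for the number-needed-to-treat figure in this abstract: NNT is conventionally the reciprocal of the absolute difference in response rates between groups. The abstract does not report those rates, so the sketch below only inverts the reported value of 1.82 to show the difference it implies. The function names are illustrative assumptions.

```python
def number_needed_to_treat(risk_difference):
    """NNT as the reciprocal of the absolute difference in response rates."""
    if risk_difference <= 0:
        raise ValueError("risk difference must be positive")
    return 1.0 / risk_difference


def implied_risk_difference(nnt):
    """Absolute difference in response rates implied by a given NNT."""
    if nnt <= 0:
        raise ValueError("NNT must be positive")
    return 1.0 / nnt


reported_nnt = 1.82  # value stated in the abstract for total AQOL score
implied = implied_risk_difference(reported_nnt)
print(round(implied, 3))
```

Under this conventional definition, an NNT of 1.82 corresponds to a difference of roughly 55 percentage points in the proportion of responders between the two groups.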

    Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340.

    Structural Relationships between Highly Conserved Elements and Genes in Vertebrate Genomes

    Large numbers of sequence elements have been identified as highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and additional computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, as well as cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings about several individual genes. We found that conserved associations between HCEs and genes are not restricted by the absolute distance between them on the genome. Notably, long-range associations were identified, and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes.

    Adaptive Strategy for the Statistical Analysis of Connectomes

    We study an adaptive statistical approach to analyze brain networks represented by brain connection matrices of interregional connectivity (connectomes). Our approach sits at a middle level between a global analysis and a single-connection analysis by considering subnetworks of the global brain network. These subnetworks represent either the inter-connectivity between two brain anatomical regions or the intra-connectivity within the same brain anatomical region. An appropriate summary statistic that characterizes a meaningful feature of the subnetwork is evaluated. Based on this summary statistic, a statistical test is performed to derive the corresponding p-value. Reformulating the problem in this way reduces the number of statistical tests in an orderly fashion based on our understanding of the problem. Considering the global testing problem, the p-values are corrected to control the rate of false discoveries. Finally, the procedure is followed by a local investigation within the significant subnetworks. We contrast this strategy with the one based on the individual measures in terms of power. We show that this strategy has great potential, in particular in cases where the subnetworks are well defined and the summary statistics are properly chosen. As an application example, we compare structural brain connection matrices of two groups of subjects with a 22q11.2 deletion syndrome, distinguished by their IQ scores.
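
The subnetwork-level procedure outlined in this abstract (summarize each subnetwork, test it, then correct the subnetwork p-values for false discoveries) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the mean connection weight is an assumed summary statistic, the Benjamini-Hochberg step-up rule is an assumed choice of false discovery correction, and the matrix and p-values are made up.

```python
def subnetwork_mean(matrix, rows, cols):
    """Mean connection weight over a block of the connection matrix.

    Self-connections (i == j) are skipped, so the same function serves
    inter-regional blocks and intra-regional blocks (rows == cols).
    """
    values = [matrix[i][j] for i in rows for j in cols if i != j]
    return sum(values) / len(values)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical 3-node connection matrix and two subnetworks.
conn = [[0, 1, 2],
        [1, 0, 3],
        [2, 3, 0]]
inter = subnetwork_mean(conn, rows=[0, 1], cols=[2])     # between two regions
intra = subnetwork_mean(conn, rows=[0, 1], cols=[0, 1])  # within one region

# Hypothetical subnetwork-level p-values from some group comparison.
significant = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.05)
print(inter, intra, significant)
```

Only the subnetworks flagged as significant would then be examined connection by connection, which is what keeps the number of tests small compared with testing every connection individually.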